Extracting data

Data are extracted via Extract steps into fields in the Data Model. This topic explains how to do that. Fields can also be filled with other data: the result of a JavaScript script or the value of a property. To learn how to do that, see Fields.

Before you start

Data source settings

Data source settings must be made beforehand, not only to make sure that the data are read properly, but also to organize them in a record structure that meets the purpose of the data mapping configuration (see Data source settings). It is important to set the boundaries before starting to extract data, especially transactional data (see Extracting transactional data). Boundaries determine which data blocks (lines, pages or nodes) form a record in the source data. Data located in different records cannot be merged into a single record in the record set that results from the extraction workflow.
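The effect of boundaries can be pictured with a small sketch. It is only a conceptual model, not the application's implementation: the function splitIntoRecords and the fixed "every N lines" rule are assumptions made for illustration.

```javascript
// Hypothetical model: a boundary is set after every `linesPerRecord` lines.
// Each group of lines becomes one record in the source data.
function splitIntoRecords(lines, linesPerRecord) {
  const records = [];
  for (let i = 0; i < lines.length; i += linesPerRecord) {
    records.push(lines.slice(i, i + linesPerRecord));
  }
  return records;
}

const source = ["Invoice 1001", "Total 25.00", "Invoice 1002", "Total 40.00"];
const records = splitIntoRecords(source, 2);

console.log(records.length); // 2
console.log(records[1]);     // [ 'Invoice 1002', 'Total 40.00' ]
```

In this model, an extraction that runs on the first record only ever sees the first two lines, which is why the boundaries have to be right before any Extract step is added.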

Preprocessor step

The Preprocessor step allows the application to perform actions on the data file itself before it is handed over to the Data Mapping workflow. In addition, properties can be defined in this step. These properties may be used throughout the extraction workflow. For more information, see Preprocessor step.

Adding an extraction

In an extraction workflow, Extract steps are the pieces that take care of the actual data extractions.
To add an Extract step:

  1. In the Data Viewer pane, select the data that needs to be extracted. (See Selecting data.)

  2. Choose one of two ways to extract the selected data.

    • Right-click on the selected data and select Add Extraction from the contextual menu.

      For optimization purposes, it is better to add data to an existing Extract step than to have a succession of Extract steps. To do that, first select that step on the Steps pane; then right-click the selected data and choose Add Extract Field.

    • Alternatively, drag and drop the selected data into the Data Model pane.

      In a PDF or Text file, use the Drag icon to drag selected data into the Data Model.

      With this method, a new Extract step is only added to the extraction workflow if no Extract step is currently selected. Otherwise, the field is added to the currently selected Extract step.
      Dragging data onto an existing field in the Data Model replaces the data in that field. The field name stays the same.
      Drop data on empty fields or on the record itself to add new fields.

Special conditions

The Extract step may need to be combined with another type of step to get the desired result.

  • Data can be extracted conditionally with a Condition step or Multiple Conditions step; see Condition step or Multiple Conditions step.

  • Normally, the same extraction workflow is automatically applied to all records in the source data. It is possible, however, to skip records entirely or partially using an Action step. Add an Action step in a branch under a Condition step or Multiple Conditions step (see Action step) and set the type of action to Stop Processing Record (see Action step properties).

  • To extract transactional data, the Extract step must be placed inside a Repeat step. See Extracting transactional data.
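As a rough illustration of skipping part of a record, the sketch below models a workflow as a plain function. The record layout, the "Void" document type and the function runWorkflow are hypothetical; the early return stands in for an Action step set to Stop Processing Record, on the assumption that fields extracted before that point are kept.

```javascript
// Hypothetical record layout: line 0 holds the document type, line 1 the total.
function runWorkflow(recordLines) {
  const fields = {};

  // First extraction: the document type.
  fields.type = recordLines[0].split(" ")[0];

  // Condition branch: stop processing this record when it is void.
  if (fields.type === "Void") {
    return fields; // stands in for Stop Processing Record
  }

  // Later extraction: only reached for records that were not stopped.
  fields.total = recordLines[1].split(" ")[1];
  return fields;
}

const results = [
  ["Invoice 1001", "Total 25.00"],
  ["Void 1002", "Total 0.00"],
].map(runWorkflow);

console.log(results);
```

The second record keeps its type field but never receives a total, which corresponds to a record that is skipped partially.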

Data cannot be extracted more than once in any record, unless the Extract steps are mutually exclusive. This is the case when they are located in different branches of a Condition step or Multiple Conditions step.

Inside a Detail table, multiple Extract steps may extract the same data but each of them will create a new child record in the Detail table.

If you tick the Append values to current record option when several steps are extracting the same field, the step fails with an error.

Extracting data into multiple fields

When you select multiple fields in a CSV or tabular data file and extract them simultaneously, they are put into different fields in the Data Model automatically.
In a PDF or Text file, when multiple lines are extracted at the same time, they are by default joined and put into one field in the Data Model. To split them and put the data into different fields:

  1. Select the field in the Data Model that contains the extracted lines.

  2. On the Step properties pane, under Field Definition, click the drop-down next to Split and select Split lines.
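The difference between the default behavior and Split lines can be sketched as follows. This is a conceptual model only: the space separator, the field names Field1, Field2, ... and both functions are assumptions chosen for illustration, not the names or separator the application uses.

```javascript
const selectedLines = ["John Smith", "12 Main Street", "Springfield"];

// Default behavior (modeled): all selected lines are joined into one field.
function extractJoined(lines, separator = " ") {
  return { Field1: lines.join(separator) };
}

// Split lines (modeled): each selected line goes into its own field.
function extractSplit(lines) {
  const fields = {};
  lines.forEach((line, index) => {
    fields["Field" + (index + 1)] = line;
  });
  return fields;
}

console.log(extractJoined(selectedLines));
console.log(extractSplit(selectedLines));
```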

Adding fields to an existing Extract step

For optimization purposes, it is better to add fields to an existing Extract step than to have a succession of Extract steps.

To add fields to an existing Extract step:

  1. In the Data Viewer pane, select the data that needs to be extracted. (See Selecting data.)

  2. Select an Extract step on the Steps pane.

  3. Right-click on the data and select Add Extract Field, or drag & drop the data on the Data Model.

When data are dropped on the Data Model, they are by default added to the currently selected Extract step.

Editing fields

After extracting some data, you may want to:

  • Change the names of fields that are included in the extraction.

  • Change the order in which fields are extracted.

  • Set the data type, data format and default value of each field.

  • Modify the extracted data through a script.

  • Delete a field.

All this can be done via the Step properties pane (see Extract step properties), because the fields in the Data Model are seen as properties of an Extract step. See also: Fields.
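As an example of the kind of change a script might make to extracted data, the sketch below turns a formatted amount into a number. It is plain JavaScript and does not use the application's scripting API; the function cleanAmount and the sample value are hypothetical.

```javascript
// Hypothetical clean-up of an extracted amount such as " $1,250.50 ".
function cleanAmount(raw) {
  // Keep only digits, the decimal point and the minus sign.
  const digits = raw.replace(/[^0-9.\-]/g, "");
  return parseFloat(digits);
}

console.log(cleanAmount(" $1,250.50 ")); // 1250.5
```

This sketch assumes a period as the decimal separator; data that use a comma would need a different rule.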

Testing the extraction workflow

The extraction workflow is always performed on the current record in the data source. When an error is encountered, the extraction workflow stops, and the field on which the error occurred and all subsequent fields will be greyed out. Click the Messages tab (next to the Step properties pane) to see any error messages.

To test the extraction workflow on all records, you can:

  • Click the Validate All Records toolbar button.

  • Select Data > Validate Records in the menu.

If any errors are encountered in one or more records, an error message will be displayed. Errors encountered while performing the extraction workflow on the current record will also be visible on the Messages tab.
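The behavior described above (processing stops at the first error in a record, and validation reports the errors found across all records) can be modeled in a few lines. The function validateAllRecords, the step requireTotal and the message format are assumptions made for illustration and do not reflect how the application reports errors.

```javascript
// Hypothetical model: each step is a function that throws when it fails.
function validateAllRecords(records, steps) {
  const messages = [];
  records.forEach((record, index) => {
    try {
      // Steps run in order; the first error stops this record.
      for (const step of steps) {
        step(record);
      }
    } catch (err) {
      messages.push(`Record ${index + 1}: ${err.message}`);
    }
  });
  return messages;
}

// A sample step that fails when the record has no "total" value.
const requireTotal = (record) => {
  if (!("total" in record)) {
    throw new Error("total is missing");
  }
};

const messages = validateAllRecords([{ total: "25.00" }, {}], [requireTotal]);
console.log(messages); // [ 'Record 2: total is missing' ]
```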